{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Multi-GPU Training with Caffe2 \n",
    "\n",
    "![caffe2 imagenet logo](images/imagenet-caffe2.png)\n",
    "\n",
    "For this tutorial we will explore multi-GPU training. We will show you a basic structure for using the `data_parallel_model` to quickly process a subset of the ImageNet database along the same design as the [ResNet-50 model](https://arxiv.org/abs/1512.03385). We will also get a chance to look under the hood at a few of Caffe2's C++ operators that efficiently handle your image pipeline, build a ResNet model, train on a single GPU and show some optimizations that are included with `data_parallel_model`, and finally we'll scale it up and show you how to parallelize your model so you can run it on multiple GPUs.\n",
    "\n",
    "## About the Dataset\n",
    "\n",
    "A commonly used dataset for benchmarking image recognition technologies is [ImageNet](http://image-net.org/). It is huge. It has images that cover the gamut, and they're categorized by labels so that you can create image subsets of animals, plants, fungi, people, objects, you name it. It's the focus of yearly competitions and this is where deep learning and convolutional neural networks (CNN) really made its name. During the 2012 ImageNet Large-Scale Visual Recognition Challenge a CNN demonstrated accuracy more than 10% beyond the next competing method. Going from around 75% accuracy to around 85% accuracy when every year the gains were only a percent or two is a significant accomplishment. \n",
    "\n",
    "![imagenet montage](images/imagenet-montage.jpg)\n",
    "\n",
    "So let's play with ImageNet and train our own model on a bunch of GPUs! You're going to need a lot space to host the 14 million images in ImageNet. How much disk space do you have? You should clear up about 300GB of space... on SSD. Spinning discs are so 2000. How much time do you have? With two GPUs maybe we'll be done in just under a week. Ready?\n",
    "\n",
    "![one does not simply train imagenet in a minute](images/imagenet-meme.jpg)\n",
    "\n",
    "That's way too much space and way too long for a tutorial! If you happened to have that much space and 128 GPUs on the latest NVIDIA V100's then you're super awesome and you can replicate our recent results shown below. You might even be able to train ImageNet in under an hour. Given how this performance seems to scale, **maybe YOU can train ImageNet in a minute!** Think about all of the things you could accomplish... a model for millions of hours of video? Catalogue every cat video on YouTube? Look for your doppleganger on Imgur?\n",
    "\n",
    "Instead of tons of GPUs and the full set of data, we're going to do this cooking show style. We're going to use a small batch images to train on, and show how you can scale that up. We chose a small slice of ImageNet: a set of 640 cars and 640 boats for our training set. We have 48 cars and 48 boats for our test set. This makes our database of images around 130 MB.\n",
    "\n",
    "## ResNet-50 Model Training Overview\n",
    "\n",
    "Below is an overview of what is needed to train and test this model across multiple GPUs. You see that it is generally not that long, nor is it that complicated. Some of the interactions for creating the parallelized model are handled by custom functions you have to write and we'll go over those later.\n",
    "\n",
    "1. use `brew` to create a model for training (we'll create one for testing later)\n",
    "2. create a database reader using the model helper object's `CreateDB` to pull the images\n",
    "3. create functions to run a ResNet-50 model for one or more GPUs\n",
    "3. create the parallelized model\n",
    "4. loop through the number of epochs you want to run, then for each epoch\n",
    "    * run the train model till you finish each batch of images\n",
    "    * run the test model\n",
    "    * calculate times, accuracies, and display the results\n",
    "\n",
    "## Part 1: Setup\n",
    "\n",
    "Your first assignment is to get your training and testing image database setup. We've created one for you and all you have to do run the code block below. This assumes you know how to use IPython. When we say run a code block, you can click the block and hit the Play button above or hit Ctrl-Enter on your keyboard. If this is news to you it is advisable that you start with introductory tutorials and get used to IPython and Caffe2 basics first.\n",
    "\n",
    "The code below will download a small database of boats and cars images and their labels for you if it doesn't already exist. The images were pulled from ImageNet and added to a `lmdb` format database. You can download it directly [here](https://download.caffe2.ai/databases/resnet_trainer.zip) unzip it, and change the folder locations to an NFS if that better suits your situation. The tutorial's default location is for you to place it in `~/caffe2_notebooks/tutorial_data/resnet_trainer`.\n",
    "\n",
    "You can also swap out the database with your own as long as it is in lmdb and you change the `train_data_count` and `test_data_count` variables below. For your first time just use that database we made for you.\n",
    "\n",
    "We're going to give you all the dependencies needed for the tutorial in the block below. \n",
    "\n",
    "### Task: Run the Setup Code\n",
    "Read and then run the code block below. Note what modules are being imported and where we're accessing the database. Note and troubleshoot any errors in case something is wrong with your environment. Don't worry about the `nccl` and `gloo` warning messages.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from caffe2.python import core, workspace, model_helper, net_drawer, memonger, brew\n",
    "from caffe2.python import data_parallel_model as dpm\n",
    "from caffe2.python.models import resnet\n",
    "from caffe2.proto import caffe2_pb2\n",
    "\n",
    "import numpy as np\n",
    "import time\n",
    "import os\n",
    "from IPython import display\n",
    "    \n",
    "workspace.GlobalInit(['caffe2', '--caffe2_log_level=2'])\n",
    "\n",
    "# This section checks if you have the training and testing databases\n",
    "current_folder = os.path.join(os.path.expanduser('~'), 'caffe2_notebooks')\n",
    "data_folder = os.path.join(current_folder, 'tutorial_data', 'resnet_trainer')\n",
    "\n",
    "# Train/test data\n",
    "train_data_db = os.path.join(data_folder, \"imagenet_cars_boats_train\")\n",
    "train_data_db_type = \"lmdb\"\n",
    "# actually 640 cars and 640 boats = 1280\n",
    "train_data_count = 1280\n",
    "test_data_db = os.path.join(data_folder, \"imagenet_cars_boats_val\")\n",
    "test_data_db_type = \"lmdb\"\n",
    "# actually 48 cars and 48 boats = 96\n",
    "test_data_count = 96\n",
    "\n",
    "# Get the dataset if it is missing\n",
    "def DownloadDataset(url, path):\n",
    "    import requests, zipfile, StringIO\n",
    "    print(\"Downloading {} ... \".format(url))\n",
    "    r = requests.get(url, stream=True)\n",
    "    z = zipfile.ZipFile(StringIO.StringIO(r.content))\n",
    "    z.extractall(path)\n",
    "    print(\"Done downloading to {}!\".format(path))\n",
    "\n",
    "# Make the data folder if it doesn't exist\n",
    "if not os.path.exists(data_folder):\n",
    "    os.makedirs(data_folder)\n",
    "else:\n",
    "    print(\"Data folder found at {}\".format(data_folder))\n",
    "# See if you already have to db, and if not, download it\n",
    "if not os.path.exists(train_data_db):\n",
    "    DownloadDataset(\"https://download.caffe2.ai/databases/resnet_trainer.zip\", data_folder) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Task: Check the Database\n",
    "\n",
    "Take a look at your data folder. You should find two subfolders, each of which will contain a single `data.mdb` file (or possibly also a lock file):\n",
    "1. imagenet_cars_boats_train (train for training, not locomotives!)\n",
    "2. imagenet_cars_boats_val (val for validation or testing)\n",
    "\n",
    "## Part 2: Configure the Training\n",
    "\n",
    "Below you can tinker with some of the settings for how the model will be created. One obvious setting to try is the `gpus`. By removing one or adding one you're directly impacting the amount of time it will take to run even on this small dataset.\n",
    "\n",
    "`batch_per_device` is the number of images processed at a time on each GPU. Using the default of 32 for 2 GPUs will equate to 32 images on each GPU for a total of 64 per mini-batch, so we'll go through the whole database and complete an epoch in 20 iterations. This is something you would want to adjust if you're sharing the GPU or otherwise want to adjust how much memory this training run is going to take up. You can see in the line below it being set to `32` we're adjusting the `total_batch_size` based on the number of GPUs.\n",
    "\n",
    "`base_learning_rate` and `weight_decay` will both influence training and can be interesting to change and witness the impact on accuracy or confidence is the results that are shown in the last section of this tutorial.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Configure how you want to train the model and with how many GPUs\n",
    "# This is set to use two GPUs in a single machine, but if you have more GPUs, extend the array [0, 1, 2, n]\n",
    "gpus = [0]\n",
    "\n",
    "# Batch size of 32 sums up to roughly 5GB of memory per device\n",
    "batch_per_device = 32\n",
    "total_batch_size = batch_per_device * len(gpus)\n",
    "\n",
    "# This model discriminates between two labels: car or boat\n",
    "num_labels = 2\n",
    "\n",
    "# Initial learning rate (scale with total batch size)\n",
    "base_learning_rate = 0.0004 * total_batch_size\n",
    "\n",
    "# only intends to influence the learning rate after 10 epochs\n",
    "stepsize = int(10 * train_data_count / total_batch_size)\n",
    "\n",
    "# Weight decay (L2 regularization)\n",
    "weight_decay = 1e-4"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 3: \n",
    "\n",
    "### Using Caffe2 Operators to Create a CNN\n",
    "\n",
    "Caffe2 comes with `ModelHelper` which will do a lot of the heavy lifting for you when setting up a model. Throughout the docs and tutorial this may also be called a `model helper object`. The only required parameter is `name`. It is an arbitrary name for referencing the network in your workspace: you could call it tacos or boatzncarz. For example:\n",
    "\n",
    "```python\n",
    "taco_model = model_helper.ModelHelper(name=\"tacos\")\n",
    "```\n",
    "\n",
    "You should also reset your workspace if you run these parts multiple times. Do this just before creating the new model helper object.\n",
    "\n",
    "```python\n",
    "workspace.ResetWorkspace()\n",
    "```\n",
    "\n",
    "### Reading from the Database\n",
    "\n",
    "Another handy function for feeding your network with images is `CreateDB`, which in this case we need to serve as a database reader for the database we've already created. You can create a reader object like this: \n",
    "\n",
    "```python\n",
    "reader = taco_model.CreateDB(name, db, db_type)\n",
    "```\n",
    "\n",
    "### Task: Create a Model Helper Object\n",
    "Remember, we have two databases and each will have their own model, but for now we only need to create the training model for the training db. Use the Work Area below. Also, while you do this, experiment with IPython's development hooks by typing the first part of the name from the imported class or module and hitting the tab key. For example when creating the object you type: `train_model = model_helper.` and after the dot, hit \"tab\". You should see a full list of available functions. Then when you choose `ModelHelper` hit \"(\" then hit tab and you should see a full list of params. This is very handy when you're exploring new modules and their functions!\n",
    "\n",
    "### Task: Create a Reader\n",
    "We also need one reader. We have established the db location, `train_data_db`, and type, `train_data_db_type`, in \"Part 1: Setup\", so all you have to do is name it and pass in the configs. Again, `name` is arbitrary so you could call it \"kindle\" if you wanted. Use the Work Area below, and when you are finished run the code block."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# LAB WORK AREA FOR PART 3\n",
    "\n",
    "# Clear workspace to free allocated memory, in case you are running this for a second time.\n",
    "workspace.ResetWorkspace()\n",
    "\n",
    "# 1. Create your model helper object for the training model with ModelHelper\n",
    "\n",
    "\n",
    "# 2. Create your database reader with CreateDB\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 4: Image Transformations (requires Caffe2 to be compiled with opencv)\n",
    "\n",
    "Now that we have a reader we should take a look at how we're going to process the images. Since images that are found in the wild can be wildly different sizes, aspect ratios, and orientations we can and should train on as much variety as we can. ImageNet is no exception here. The average resolution is 496x387, and as interesting as that factoid might be, the bottom line is that you have a lot of variation. \n",
    "\n",
    "As the training images are ingested we would want to conform them to a standard size. The most direct process of doing so could follow a simple ingest where you transform the image to 256x256. We talked about the drawbacks of doing this in [Image Pre-Processing](Image_Pre-Processing_Pipeline.ipynb). Therefore for more accurate results, we should probably rescale, then crop. Even this approach with cropping has the drawbacks of losing some info from the original photo. What get chopped off doesn't make into the training data. If you ran the pre-processing tutorial on the image of the astronauts you will recall that some of the astronauts didn't make the cut. Where'd they go? Wash-out lane? Planet of the Apes? If your model was to detect people, then those lost astronauts would not be getting due credit when you run inference or face detection later using the model.\n",
    "\n",
    "### Introducing... the ImageInput Operator\n",
    "\n",
    "What could be seen as a loss turns into an opportunity. You can crop randomly around the image to create many deriviates of the original image, boosting your training data set, thereby adding robustness to the model. What if the image only has half a car or the front of a boat? You still want your model to be able to detect it! In the image below only the front a boat is shown and the model shows a 50% confidence in detection.\n",
    "\n",
    "![boat image](images/imagenet-boat.png)\n",
    "\n",
    "Caffe2 has a solution for this in its [`ImageInput` operator](https://github.com/caffe2/caffe2/blob/master/caffe2/image/image_input_op.h), a C++ image manipulation op that's used under the hood of several of the Caffe2 Python APIs.\n",
    "\n",
    "Here is a reference implementation:\n",
    "\n",
    "```python\n",
    "def add_image_input_ops(model):\n",
    "    # utilize the ImageInput operator to prep the images\n",
    "    data, label = model.ImageInput(\n",
    "        reader,\n",
    "        [\"data\", \"label\"],\n",
    "        batch_size=batch_per_device,\n",
    "        # mean: to remove color values that are common\n",
    "        mean=128.,\n",
    "        # std is going to be modified randomly to influence the mean subtraction\n",
    "        std=128.,\n",
    "        # scale to rescale each image to a common size\n",
    "        scale=256,\n",
    "        # crop to the square each image to exact dimensions\n",
    "        crop=224,\n",
    "        # not running in test mode\n",
    "        is_test=False,\n",
    "        # mirroring of the images will occur randomly\n",
    "        mirror=1\n",
    "    )\n",
    "    # prevent back-propagation: optional performance improvement; may not be observable at small scale\n",
    "    data = model.StopGradient(data, data)\n",
    "```\n",
    "\n",
    "* mean: remove info that's common in most images\n",
    "* std: used to create a randomization for both cropping and mirroring\n",
    "* scale: downres each image so that its shortest side matches this base resolution\n",
    "* crop: the image size we want every image to be (using random crops from the scaled down image)\n",
    "* mirror: randomly mirror the images so we can train on both representations\n",
    "\n",
    "The [`StopGradient` operator](https://caffe2.ai/docs/operators-catalogue.html#stopgradient) does no numerical computation. It is used here to prevent back propagation which isn't wanted in this network.\n",
    "\n",
    "### Task: Implement the InputImage Operator\n",
    "Use the Work Area below to finish the stubbed out function. Refer to the reference implementation for help on this task. \n",
    "\n",
    "* What happens if you don't add a mean, don't add a std, or don't mirror. How does this change your accuracy when you run it for many epochs?\n",
    "* What would happen if we didn't do StopGradient?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# LAB WORK AREA FOR PART 4\n",
    "\n",
    "def add_image_input_ops(model):\n",
    "    raise NotImplementedError # Remove this from the function stub\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 5: Creating a Residual Network\n",
    "\n",
    "Now you get the opportunity to use Caffe2's Resnet-50 creation function! During our Setup we `from caffe2.python.models import resnet`. We can use that for our `create_resnet50_model_ops` function that we still need to create and the main part of that will be the `resnet.create_resnet50()` function as described below:\n",
    "\n",
    "```python\n",
    "create_resnet50(\n",
    "    model, \n",
    "    data, \n",
    "    num_input_channels, \n",
    "    num_labels, \n",
    "    label=None, \n",
    "    is_test=False, \n",
    "    no_loss=False, \n",
    "    no_bias=0, \n",
    "    conv1_kernel=7, \n",
    "    conv1_stride=2, \n",
    "    final_avg_kernel=7\n",
    ")\n",
    "```\n",
    "\n",
    "Below is a reference implementation of the function using `resnet.create_resnet50()`.\n",
    "\n",
    "```python\n",
    "def create_resnet50_model_ops(model, loss_scale):\n",
    "    # Creates a residual network\n",
    "    [softmax, loss] = resnet.create_resnet50(\n",
    "        model,\n",
    "        \"data\",\n",
    "        num_input_channels=3,\n",
    "        num_labels=num_labels,\n",
    "        label=\"label\",\n",
    "    )\n",
    "    prefix = model.net.Proto().name\n",
    "    loss = model.Scale(loss, prefix + \"_loss\", scale=loss_scale)\n",
    "    model.Accuracy([softmax, \"label\"], prefix + \"_accuracy\")\n",
    "    return [loss]\n",
    "```\n",
    "\n",
    "### Task: Implement the forward_pass_builder_fun Using Resnet-50\n",
    "In the code block above where we stubbed out the `create_resnet50_model_ops` function, utilize `resnet.create_resnet50()` to create a residual network, then returning the loss. Refer to the reference implementation for help on this task.\n",
    "\n",
    "* Bonus points: if you take a look at the resnet class in the Caffe2 docs you'll notice a function to create a 32x32 model. Try it out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# LAB WORK AREA FOR PART 5\n",
    "\n",
    "def create_resnet50_model_ops(model, loss_scale):\n",
    "    raise NotImplementedError #remove this from the function stub\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 6: Make the Network Learn\n",
    "\n",
    "\n",
    "Caffe2 model helper object has several built in functions that will help with this learning by using backpropagation where it will be adjusting weights as it runs through iterations.\n",
    "\n",
    "* AddWeightDecay\n",
    "* Iter\n",
    "* net.LearningRate\n",
    "\n",
    "Below is a reference implementation:\n",
    "\n",
    "```python\n",
    "def add_parameter_update_ops(model):\n",
    "    model.AddWeightDecay(weight_decay)\n",
    "    iter = model.Iter(\"iter\")\n",
    "    lr = model.net.LearningRate(\n",
    "        [iter],\n",
    "        \"lr\",\n",
    "        base_lr=base_learning_rate,\n",
    "        policy=\"step\",\n",
    "        stepsize=stepsize,\n",
    "        gamma=0.1,\n",
    "    )\n",
    "    # Momentum SGD update\n",
    "    for param in model.GetParams():\n",
    "        param_grad = model.param_to_grad[param]\n",
    "        param_momentum = model.param_init_net.ConstantFill(\n",
    "            [param], param + '_momentum', value=0.0\n",
    "        )\n",
    "\n",
    "        # Update param_grad and param_momentum in place\n",
    "        model.net.MomentumSGDUpdate(\n",
    "            [param_grad, param_momentum, lr, param],\n",
    "            [param_grad, param_momentum, param],\n",
    "            momentum=0.9,\n",
    "            # Nesterov Momentum works slightly better than standard momentum\n",
    "            nesterov=1,\n",
    "        )\n",
    "```\n",
    "\n",
    "### Task: Implement the forward_pass_builder_fun Using Resnet-50\n",
    "Several of our Configuration variables will get used in this step. Take a look at the Configuration section from Part 2 and refresh your memory. We stubbed out the `add_parameter_update_ops` function, so to finish it, utilize `model.AddWeightDecay` and set `weight_decay`. Calculate your stepsize using `int(10 * train_data_count / total_batch_size)` or pull the value from the config. Instantiate the learning iterations with `iter = model.Iter(\"iter\")`. Use `model.net.LearningRate()` to finalize your parameter update operations. You can optionally update you SGD's momentum. It might not make a difference in this small implementation, but if you're gonna go big later, then you'll want to do this.\n",
    "\n",
    "Refer to the reference implementation for help on this task.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# LAB WORK AREA FOR PART 6\n",
    "\n",
    "def add_parameter_update_ops(model):\n",
    "    raise NotImplementedError #remove this from the function stub\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 7: Gradient Optimization\n",
    "\n",
    "If you run the network as is you may have issues with memory. Without memory optimization we could reduce the batch size, but we shouldn't have to do that. Caffe2 has a `memonger` function for this purpose which will find ways to reuse gradients that we created. Below is a reference implementation.\n",
    "\n",
    "```python\n",
    "def optimize_gradient_memory(model, loss):\n",
    "    model.net._net = memonger.share_grad_blobs(\n",
    "        model.net,\n",
    "        loss,\n",
    "        set(model.param_to_grad.values()),\n",
    "        # Due to memonger internals, we need a namescope here. Let's make one up; we'll need it later!\n",
    "        namescope=\"imonaboat\",\n",
    "        share_activations=False)\n",
    "```\n",
    "\n",
    "### Task: Implement memonger\n",
    "We're going to use the reference for help here, otherwise it is a little difficult to cover for the scope of this tutorial. The function is ready to go for you, but you should still soak up what's been done in this function. One of the key gotchas here is making sure you give it a namescope so that you can access the gradients you'll be creating in the next step. This name can be anything.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# LAB WORK AREA FOR PART 7\n",
    "\n",
    "def optimize_gradient_memory(model, loss):\n",
    "    raise NotImplementedError # Remove this from the function stub"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 8: Training the Network with One GPU\n",
    "\n",
    "Now that you've established be basic components to run ResNet-50, you can try it out on one GPU. Now, this could be a lot easier just going straight into the `data_parallel_model` and all of its optimizations, but to help explain the components needed and to build the helper functions to run `GPU_Parallelize`, we may as well start simple! \n",
    "\n",
    "If you're paying attention you might be wondering about the `gpus` array we made in the config and how that might throw things off. Also, when we looked at the config earlier you may have updated `gpus[0]` to have more than one GPU. That's fine. We can leave it like that for the next part because we will force our script to use just one GPU.\n",
    "\n",
    "Let's stitch together those functions from Parts 4-7 to run our residual network! Take a look at the code below, so you understand how the pieces fit together.\n",
    "\n",
    "```python\n",
    "# We need to give the network context and force it to run on the first GPU even if there are more.\n",
    "device_opt = core.DeviceOption(caffe2_pb2.CUDA, gpus[0])\n",
    "# Here's where that NameScope comes into play\n",
    "with core.NameScope(\"imonaboat\"):\n",
    "    # Picking that one GPU\n",
    "    with core.DeviceScope(device_opt):\n",
    "        # Run our reader, and create the layers that transform the images\n",
    "        add_image_input_ops(train_model)\n",
    "        # Generate our residual network and return the losses\n",
    "        losses = create_resnet50_model_ops(train_model)\n",
    "        # Create gradients for each loss\n",
    "        blobs_to_gradients = train_model.AddGradientOperators(losses)\n",
    "        # Kick off the learning and managing of the weights\n",
    "        add_parameter_update_ops(train_model)\n",
    "    # Optimize memory usage by consolidating where we can\n",
    "    optimize_gradient_memory(train_model, [blobs_to_gradients[losses[0]]])\n",
    "\n",
    "# Startup the network \n",
    "workspace.RunNetOnce(train_model.param_init_net)\n",
    "# Load all of the initial weights; overwrite lets you run this multiple times\n",
    "workspace.CreateNet(train_model.net, overwrite=True)\n",
    "```\n",
    "\n",
    "### Task: Pull It All Together & Run It!\n",
    "\n",
    "Things are getting a little hairy, so we gave you the full reference ready to go. Just run the code block below (hit ctrl-enter). Normally you might not use `overwrite=True` since that could be bad for what you're doing by accidentally erasing your earlier work, so try removing it and running the block multiple times to see what happens. Imagine the case where you have multiple networks going that have the same name. You don't want to overwrite, so you might want to start up a new workspace or modify the names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# LAB WORK AREA FOR PART 8\n",
    "\n",
    "device_opt = core.DeviceOption(caffe2_pb2.CUDA, gpus[0])\n",
    "with core.NameScope(\"imonaboat\"):\n",
    "    with core.DeviceScope(device_opt):\n",
    "        add_image_input_ops(train_model)\n",
    "        losses = create_resnet50_model_ops(train_model)\n",
    "        blobs_to_gradients = train_model.AddGradientOperators(losses)\n",
    "        add_parameter_update_ops(train_model)\n",
    "    optimize_gradient_memory(train_model, [blobs_to_gradients[losses[0]]])\n",
    "\n",
    "\n",
    "workspace.RunNetOnce(train_model.param_init_net)\n",
    "workspace.CreateNet(train_model.net, overwrite=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 8 ... part ~~2~~ Deux: Train!\n",
    "Here's the fun part where you can tinker with the number of epochs to run and mess with the display. We'll leave this for you to play with as a fait accompli since you worked so hard to get this far!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "num_epochs = 1\n",
    "for epoch in range(num_epochs):\n",
    "    # Split up the images evenly: total images / batch size\n",
    "    num_iters = int(train_data_count / total_batch_size)\n",
    "    for iter in range(num_iters):\n",
    "        # Stopwatch start!\n",
    "        t1 = time.time()\n",
    "        # Run this iteration!\n",
    "        workspace.RunNet(train_model.net.Proto().name)\n",
    "        t2 = time.time()\n",
    "        dt = t2 - t1\n",
    "        \n",
    "        # Stopwatch stopped! How'd we do?\n",
    "        print((\n",
    "            \"Finished iteration {:>\" + str(len(str(num_iters))) + \"}/{}\" +\n",
    "            \" (epoch {:>\" + str(len(str(num_epochs))) + \"}/{})\" + \n",
    "            \" ({:.2f} images/sec)\").\n",
    "            format(iter+1, num_iters, epoch+1, num_epochs, total_batch_size/dt))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 9: Getting Parallelized \n",
    "\n",
    "You get bonus points if you can say \"getting parallelized\" three times fast without messing up. You just saw some interesting numbers in the last step. Take note of those and see how things scale up when we use more GPUs. \n",
    "\n",
    "We're going to use Caffe2's `data_parallel_model` and its function called `Parallelize_GPU` to help us accomplish this task. The task to setup the parallel model, not to say it fast. Here's the spec on `Parallelize_GPU`:\n",
    "\n",
    "```python\n",
    "Parallelize_GPU(\n",
    "    model_helper_obj, \n",
    "    input_builder_fun, \n",
    "    forward_pass_builder_fun, \n",
    "    param_update_builder_fun, \n",
    "    devices=range(0, workspace.NumCudaDevices()), \n",
    "    rendezvous=None, \n",
    "    net_type='dag', \n",
    "    broadcast_computed_params=True, \n",
    "    optimize_gradient_memory=False)\n",
    "```\n",
    "\n",
    "We're not ready to just call this function though. As you can see in the second, third, and fourth input parameters, they are expecting functions to be passed to them. [More API details here.](https://caffe2.ai/doxygen-python/html/namespacedata__parallel__model.html#a1fe7262a0a66754f19998fa1603317b9) The three functions expected are:\n",
    "\n",
    "1. `input_build_fun`: adds the input operators. Note: Remember to instantiate reader outside of this function so all GPUs share same reader object. Signature:  input_builder_fun(model)\n",
    "2. `forward_pass_builder_fun`: adds the operators to the model. Must return list of loss-blob references that are used to build the gradient. Loss scale parameter is passed, as you should scale the loss of your model by 1.0 / the total number of gpus. Signature: forward_pass_builder_fun(model, loss_scale)\n",
    "3. `param_update_builder_fun`: adds operators that are run after gradient update, such as updating the weights and weight decaying. Signature: param_update_builder_fun(model)\n",
    "\n",
    "For the `input_build_fun` we're going to use the reader we created with `CreateDB` along with a function that leverages Caffe2's `ImageInput` operator. Sound familiar? You already did this in Part 4!\n",
    "\n",
    "For the `forward_pass_builder_fun` we need to have residual neural network. You already did this in Part 5!\n",
    "\n",
    "For the `param_update_builder_fun` we need a function to adjust the weights as the network runs. You already did this in Part 6! \n",
    "\n",
    "Let's stub out the `Parallelize_GPU` function with the parameters that we're going to use. Recall that in the setup we  `from caffe2.python import data_parallel_model as dpm`, so we can use `dpm.Parallelize_GPU()` to access the `Parallelize_GPU` function. First we'll stub out the three other functions to that this expects, add the params based on these functions names and our gpu count, then come back to the lab cell below to populate them with some logic and test them. Below is a reference implementation:\n",
    "\n",
    "```python\n",
    "dpm.Parallelize_GPU(\n",
    "    train_model,\n",
    "    input_builder_fun=add_image_input_ops,\n",
    "    forward_pass_builder_fun=create_resnet50_model_ops,\n",
    "    param_update_builder_fun=add_parameter_update_ops,\n",
    "    devices=gpus,\n",
    "    optimize_gradient_memory=True,\n",
    ")\n",
    "```\n",
    "\n",
    "### Task: Make Your Helper Functions\n",
    "You already did this the Parts 4 through 6 and in Part 7 you had to deal with gradient optimizations that are baked into `Parallelize_GPU`. The three helper function stubs below can be eliminated or if you want to see everything together go ahead and copy the functions there, so you can run them from the work area block below.\n",
    "\n",
    "### Task: Parallelize!\n",
    "Now you can stub out a call to `Parallelize_GPU`. Use the reference implementation above if you get stuck.\n",
    "* `model_helper_object`: created in Part 3; maybe you called it taco_model, or if you weren't copying and pasting you thoughtfully called it train_model or training_model.\n",
    "* Now pass the function name for each of the three functions you just created, e.g. `input_builder_fun=add_image_input_ops`\n",
    "* `devices`: we can pass in our `gpus` array from our earlier Setup.\n",
    "* `optimize_gradient_memory`: the default is `False` but let's set it to `True`; this takes care of what we had to do in Step 7 with `memonger`.\n",
    "* other params: ignore/don't pass anything to accept their defaults\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# LAB WORK AREA for Part 9\n",
    "\n",
    "# Reinitializing our configuration variables to accomodate 2 (or more, if you have them) GPUs.\n",
    "gpus = [0, 1]\n",
    "\n",
    "# Batch size of 32 sums up to roughly 5GB of memory per device\n",
    "batch_per_device = 32\n",
    "total_batch_size = batch_per_device * len(gpus)\n",
    "\n",
    "# This model discriminates between two labels: car or boat\n",
    "num_labels = 2\n",
    "\n",
    "# Initial learning rate (scale with total batch size)\n",
    "base_learning_rate = 0.0004 * total_batch_size\n",
    "\n",
    "# only intends to influence the learning rate after 10 epochs\n",
    "stepsize = int(10 * train_data_count / total_batch_size)\n",
    "\n",
    "# Weight decay (L2 regularization)\n",
    "weight_decay = 1e-4\n",
    "\n",
    "# Clear workspace to free network and memory allocated in previous steps.\n",
    "workspace.ResetWorkspace()\n",
    "\n",
    "# Create input_build_fun\n",
    "def add_image_input_ops(model):\n",
    "    # This will utilize the reader to pull images and feed them to the training model's helper object\n",
    "    # Use the model.ImageInput operator to load data from reader & apply transformations to the images.\n",
    "    raise NotImplementedError # Remove this from the function stub\n",
    "    \n",
    "\n",
    "# Create forward_pass_builder_fun\n",
    "def create_resnet50_model_ops(model, loss_scale):\n",
    "    # Use resnet module to create a residual net\n",
    "    raise NotImplementedError # Remove this from the function stub\n",
    "\n",
    "\n",
    "# Create param_update_builder_fun\n",
    "def add_parameter_update_ops(model):\n",
    "    raise NotImplementedError # Remove this from the function stub\n",
    "\n",
    "    \n",
    "# Create new train model\n",
    "train_model = NotImplementedError\n",
    "\n",
    "# Create new reader\n",
    "reader = NotImplementedError\n",
    "\n",
    "# Create parallelized model using dpm.Parallelize_GPU\n",
    "\n",
    "\n",
    "# Use workspace.RunNetOnce and workspace.CreateNet to fire up the train network\n",
    "workspace.RunNetOnce(train_model.param_init_net)\n",
    "workspace.CreateNet(train_model.net, overwrite=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 10: Create a Test Model\n",
    "\n",
    "After every epoch of training, we like to run some validation data through our model to see how it performs.\n",
    "\n",
    "Like training, this is another net, with its own data reader. Unlike training, this net does not perform backpropagation. It only does a forward pass and compares the output of the network with the label of the validation data.\n",
    "\n",
    "You've already done these steps once before when you created the training network, so do it again, but name it something different, like \"test\".\n",
    "\n",
    "### Task: Create a Test Model\n",
    "\n",
    "* Use `ModelHelper` to create a model helper object called \"test\"\n",
    "* Use `CreateDB` to create a reader and call it \"test_reader\"\n",
    "* Use `Parallelize_GPU` to parallelize the model, but set `param_update_builder_fun=None` to skip backpropagation\n",
    "* Use `workspace.RunNetOnce` and `workspace.CreateNet` to fire up the test network "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# LAB WORK AREA for Part 10\n",
    "\n",
    "# Create your test model with ModelHelper\n",
    "\n",
    "\n",
    "# Create your reader with CreateDB\n",
    "\n",
    "\n",
    "# Use multi-GPU with Parallelize_GPU, but don't utilize backpropagation\n",
    "\n",
    "\n",
    "# Use workspace.RunNetOnce and workspace.CreateNet to fire up the test network\n",
    "workspace.RunNetOnce(test_model.param_init_net)\n",
    "workspace.CreateNet(test_model.net, overwrite=True)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get Ready to Display the Results\n",
    "At the end of every epoch we will take a look at how the network performs visually. We will also report on the accuracy of the training model and the test model. Let's not force you to write your own reporting and display code, so just run the code block below to get those features ready."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "from caffe2.python import visualize\n",
    "from matplotlib import pyplot as plt\n",
    "\n",
    "def display_images_and_confidence():\n",
    "    images = []\n",
    "    confidences = []\n",
    "    n = 16\n",
    "    data = workspace.FetchBlob(\"gpu_0/data\")\n",
    "    label = workspace.FetchBlob(\"gpu_0/label\")\n",
    "    softmax = workspace.FetchBlob(\"gpu_0/softmax\")\n",
    "    for arr in zip(data[0:n], label[0:n], softmax[0:n]):\n",
    "        # CHW to HWC, normalize to [0.0, 1.0], and BGR to RGB\n",
    "        bgr = (arr[0].swapaxes(0, 1).swapaxes(1, 2) + 1.0) / 2.0\n",
    "        rgb = bgr[...,::-1]\n",
    "        images.append(rgb)\n",
    "        confidences.append(arr[2][arr[1]])\n",
    "\n",
    "    # Create grid for images\n",
    "    fig, rows = plt.subplots(nrows=4, ncols=4, figsize=(12, 12))\n",
    "    plt.tight_layout(h_pad=2)\n",
    "\n",
    "    # Display images and the models confidence in their label\n",
    "    items = zip([ax for cols in rows for ax in cols], images, confidences)\n",
    "    for (ax, image, confidence) in items:\n",
    "        ax.imshow(image)\n",
    "        if confidence >= 0.5:\n",
    "            ax.set_title(\"RIGHT ({:.1f}%)\".format(confidence * 100.0), color='green')\n",
    "        else:\n",
    "            ax.set_title(\"WRONG ({:.1f}%)\".format(confidence * 100.0), color='red')\n",
    "\n",
    "    plt.show()\n",
    "\n",
    "    \n",
    "def accuracy(model):\n",
    "    accuracy = []\n",
    "    prefix = model.net.Proto().name\n",
    "    for device in model._devices:\n",
    "        accuracy.append(\n",
    "            np.asscalar(workspace.FetchBlob(\"gpu_{}/{}_accuracy\".format(device, prefix))))\n",
    "    return np.average(accuracy)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 11: Run Multi-GPU Training and Get Test Results\n",
    "You've come a long way. Now is the time to see it all pay off. Since you already ran ResNet once, you can glance at the code below and run it. The big difference this time is your model is parallelized! \n",
    "\n",
    "The additional components at the end deal with accuracy so you may want to dig into those specifics as a bonus task. You can try it again: just adjust the `num_epochs` value below, run the block, and see the results. You can also go back to Part 10 to reinitialize the model, and run this step again. (You may want to add `workspace.ResetWorkspace()` before you run the new models again.)\n",
    "\n",
    "Go back and check the images/sec from when you ran single GPU. Note how you can scale up with a small amount of overhead. \n",
    "\n",
    "### Task: How many GPUs would it take to train ImageNet in under a minute? "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Start looping through epochs where we run the batches of images to cover the entire dataset\n",
    "# Usually you would want to run a lot more epochs to increase your model's accuracy\n",
    "num_epochs = 2\n",
    "for epoch in range(num_epochs):\n",
    "    # Split up the images evenly: total images / batch size\n",
    "    num_iters = int(train_data_count / total_batch_size)\n",
    "    for iter in range(num_iters):\n",
    "        # Stopwatch start!\n",
    "        t1 = time.time()\n",
    "        # Run this iteration!\n",
    "        workspace.RunNet(train_model.net.Proto().name)\n",
    "        t2 = time.time()\n",
    "        dt = t2 - t1\n",
    "        \n",
    "        # Stopwatch stopped! How'd we do?\n",
    "        print((\n",
    "            \"Finished iteration {:>\" + str(len(str(num_iters))) + \"}/{}\" +\n",
    "            \" (epoch {:>\" + str(len(str(num_epochs))) + \"}/{})\" + \n",
    "            \" ({:.2f} images/sec)\").\n",
    "            format(iter+1, num_iters, epoch+1, num_epochs, total_batch_size/dt))\n",
    "        \n",
    "        # Get the average accuracy for the training model\n",
    "        train_accuracy = accuracy(train_model)\n",
    "    \n",
    "    # Run the test model and assess accuracy\n",
    "    test_accuracies = []\n",
    "    for _ in range(test_data_count / total_batch_size):\n",
    "        # Run the test model\n",
    "        workspace.RunNet(test_model.net.Proto().name)\n",
    "        test_accuracies.append(accuracy(test_model))\n",
    "    test_accuracy = np.average(test_accuracies)\n",
    "\n",
    "    print(\n",
    "        \"Train accuracy: {:.3f}, test accuracy: {:.3f}\".\n",
    "        format(train_accuracy, test_accuracy))\n",
    "    \n",
    "    # Output images with confidence scores as the caption\n",
    "    display_images_and_confidence()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "If you enjoyed this tutorial and would like to see it in action in a different way, check Caffe2's Python examples to try a [script version](https://github.com/caffe2/caffe2/blob/master/caffe2/python/examples/resnet50_trainer.py) of this multi-GPU trainer. We also have some more info below in the Appendix and a Solutions section that you can use to run the expected output of this tutorial."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Appendix\n",
    "Here are a few things you may want to play with.\n",
    "\n",
    "### Explore the workspace and the protobuf outputs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "print(str(train_model.param_init_net.Proto())[:1000] + '\\n...')"
   ]
  },
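  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a further sketch of what you could explore, the snippet below lists a few blobs in the workspace and counts the operator types in the training net. It assumes `train_model` from Part 9 has been built and its nets created, and it relies only on `workspace.Blobs()` and the net's protobuf `op` field:\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "from caffe2.python import workspace\n",
    "\n",
    "# Names of every blob currently held in the workspace\n",
    "blob_names = workspace.Blobs()\n",
    "print('{} blobs, for example: {}'.format(len(blob_names), blob_names[:5]))\n",
    "\n",
    "# Tally the operators in the training net by type\n",
    "op_counts = Counter(op.type for op in train_model.net.Proto().op)\n",
    "print(op_counts.most_common(5))\n",
    "```"
   ]
  },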
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Solutions\n",
    "This section below contains working examples for your reference. You should be able to execute these cells in order and see the expected output. **Note: this assumes you have at least 2 GPUs**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# SOLUTION for Part 1\n",
    "\n",
    "from caffe2.python import core, workspace, model_helper, net_drawer, memonger, brew\n",
    "from caffe2.python import data_parallel_model as dpm\n",
    "from caffe2.python.models import resnet\n",
    "from caffe2.proto import caffe2_pb2\n",
    "\n",
    "import numpy as np\n",
    "import time\n",
    "import os\n",
    "from IPython import display\n",
    "    \n",
    "workspace.GlobalInit(['caffe2', '--caffe2_log_level=2'])\n",
    "\n",
    "# This section checks if you have the training and testing databases\n",
    "current_folder = os.path.join(os.path.expanduser('~'), 'caffe2_notebooks')\n",
    "data_folder = os.path.join(current_folder, 'tutorial_data', 'resnet_trainer')\n",
    "\n",
    "# Train/test data\n",
    "train_data_db = os.path.join(data_folder, \"imagenet_cars_boats_train\")\n",
    "train_data_db_type = \"lmdb\"\n",
    "# actually 640 cars and 640 boats = 1280\n",
    "train_data_count = 1280\n",
    "test_data_db = os.path.join(data_folder, \"imagenet_cars_boats_val\")\n",
    "test_data_db_type = \"lmdb\"\n",
    "# actually 48 cars and 48 boats = 96\n",
    "test_data_count = 96\n",
    "\n",
    "# Get the dataset if it is missing\n",
    "def DownloadDataset(url, path):\n",
    "    import requests, zipfile, StringIO\n",
    "    print(\"Downloading {} ... \".format(url))\n",
    "    r = requests.get(url, stream=True)\n",
    "    z = zipfile.ZipFile(StringIO.StringIO(r.content))\n",
    "    z.extractall(path)\n",
    "    print(\"Done downloading to {}!\".format(path))\n",
    "\n",
    "# Make the data folder if it doesn't exist\n",
    "if not os.path.exists(data_folder):\n",
    "    os.makedirs(data_folder)\n",
    "else:\n",
    "    print(\"Data folder found at {}\".format(data_folder))\n",
    "# See if you already have to db, and if not, download it\n",
    "if not os.path.exists(train_data_db):\n",
    "    DownloadDataset(\"https://download.caffe2.ai/databases/resnet_trainer.zip\", data_folder) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# PART 1 TROUBLESHOOTING\n",
    "\n",
    "# lmdb error or unable to open database: look in the database folder from terminal and (sudo) delete the lock file and try again"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# SOLUTION for Part 2\n",
    "\n",
    "# Configure how you want to train the model and with how many GPUs\n",
    "# This is set to use two GPUs in a single machine, but if you have more GPUs, extend the array [0, 1, 2, n]\n",
    "gpus = [0, 1]\n",
    "\n",
    "# Batch size of 32 sums up to roughly 5GB of memory per device\n",
    "batch_per_device = 32\n",
    "total_batch_size = batch_per_device * len(gpus)\n",
    "\n",
    "# This model discriminates between two labels: car or boat\n",
    "num_labels = 2\n",
    "\n",
    "# Initial learning rate (scale with total batch size)\n",
    "base_learning_rate = 0.0004 * total_batch_size\n",
    "\n",
    "# only intends to influence the learning rate after 10 epochs\n",
    "stepsize = int(10 * train_data_count / total_batch_size)\n",
    "\n",
    "# Weight decay (L2 regularization)\n",
    "weight_decay = 1e-4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# SOLUTION for Part 3\n",
    "\n",
    "workspace.ResetWorkspace()\n",
    "# 1. Use the model helper to create a CNN for us\n",
    "train_model = model_helper.ModelHelper(\n",
    "    # Arbitrary name for referencing the network in your workspace: you could call it tacos or boatzncarz\n",
    "    name=\"train\",\n",
    ")\n",
    "\n",
    "\n",
    "# 2. Create a database reader\n",
    "# This training data reader is shared between all GPUs.\n",
    "# When reading data, the trainer runs ImageInputOp for each GPU to retrieve their own unique batch of training data.\n",
    "# CreateDB is inherited by ModelHelper from model_helper.py\n",
    "# We are going to name it \"train_reader\" and pass in the db configurations we set earlier\n",
    "reader = train_model.CreateDB(\n",
    "    \"train_reader\",\n",
    "    db=train_data_db,\n",
    "    db_type=train_data_db_type,\n",
    ")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# SOLUTION for Part 4\n",
    "\n",
    "def add_image_input_ops(model):\n",
    "    # utilize the ImageInput operator to prep the images\n",
    "    data, label = brew.image_input(\n",
    "        model,\n",
    "        reader,\n",
    "        [\"data\", \"label\"],\n",
    "        batch_size=batch_per_device,\n",
    "        # mean: to remove color values that are common\n",
    "        mean=128.,\n",
    "        # std is going to be modified randomly to influence the mean subtraction\n",
    "        std=128.,\n",
    "        # scale to rescale each image to a common size\n",
    "        scale=256,\n",
    "        # crop to the square each image to exact dimensions\n",
    "        crop=224,\n",
    "        # not running in test mode\n",
    "        is_test=False,\n",
    "        # mirroring of the images will occur randomly\n",
    "        mirror=1\n",
    "    )\n",
    "    # prevent back-propagation: optional performance improvement; may not be observable at small scale\n",
    "    data = model.net.StopGradient(data, data)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# SOLUTION for Part 5\n",
    "\n",
    "def create_resnet50_model_ops(model, loss_scale=1.0):\n",
    "    # Creates a residual network\n",
    "    [softmax, loss] = resnet.create_resnet50(\n",
    "        model,\n",
    "        \"data\",\n",
    "        num_input_channels=3,\n",
    "        num_labels=num_labels,\n",
    "        label=\"label\",\n",
    "    )\n",
    "    prefix = model.net.Proto().name\n",
    "    loss = model.net.Scale(loss, prefix + \"_loss\", scale=loss_scale)\n",
    "    brew.accuracy(model, [softmax, \"label\"], prefix + \"_accuracy\")\n",
    "    return [loss]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# SOLUTION for Part 6\n",
    "\n",
    "def add_parameter_update_ops(model):\n",
    "    brew.add_weight_decay(model, weight_decay)\n",
    "    iter = brew.iter(model, \"iter\")\n",
    "    lr = model.net.LearningRate(\n",
    "        [iter],\n",
    "        \"lr\",\n",
    "        base_lr=base_learning_rate,\n",
    "        policy=\"step\",\n",
    "        stepsize=stepsize,\n",
    "        gamma=0.1,\n",
    "    )\n",
    "    for param in model.GetParams():\n",
    "        param_grad = model.param_to_grad[param]\n",
    "        param_momentum = model.param_init_net.ConstantFill(\n",
    "            [param], param + '_momentum', value=0.0\n",
    "        )\n",
    "\n",
    "        # Update param_grad and param_momentum in place\n",
    "        model.net.MomentumSGDUpdate(\n",
    "            [param_grad, param_momentum, lr, param],\n",
    "            [param_grad, param_momentum, param],\n",
    "            # almost 100% but with room to grow\n",
    "            momentum=0.9,\n",
    "            # netsterov is a defenseman for the Montreal Canadiens, but\n",
    "            # Nesterov Momentum works slightly better than standard momentum\n",
    "            nesterov=1,\n",
    "        )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# SOLUTION for Part 7\n",
    "\n",
    "def optimize_gradient_memory(model, loss):\n",
    "    model.net._net = memonger.share_grad_blobs(\n",
    "        model.net,\n",
    "        loss,\n",
    "        set(model.param_to_grad.values()),\n",
    "        namescope=\"imonaboat\",\n",
    "        share_activations=False,\n",
    "        )\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# SOLUTION for Part 8\n",
    "\n",
    "device_opt = core.DeviceOption(caffe2_pb2.CUDA, gpus[0])\n",
    "with core.NameScope(\"imonaboat\"):\n",
    "    with core.DeviceScope(device_opt):\n",
    "        add_image_input_ops(train_model)\n",
    "        losses = create_resnet50_model_ops(train_model)\n",
    "        blobs_to_gradients = train_model.AddGradientOperators(losses)\n",
    "        add_parameter_update_ops(train_model)\n",
    "    optimize_gradient_memory(train_model, [blobs_to_gradients[losses[0]]])\n",
    "\n",
    "\n",
    "workspace.RunNetOnce(train_model.param_init_net)\n",
    "workspace.CreateNet(train_model.net, overwrite=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# SOLUTION for Part 8 Part Deux\n",
    "num_epochs = 1\n",
    "for epoch in range(num_epochs):\n",
    "    # Split up the images evenly: total images / batch size\n",
    "    num_iters = int(train_data_count / batch_per_device)\n",
    "    for iter in range(num_iters):\n",
    "        # Stopwatch start!\n",
    "        t1 = time.time()\n",
    "        # Run this iteration!\n",
    "        workspace.RunNet(train_model.net.Proto().name)\n",
    "        t2 = time.time()\n",
    "        dt = t2 - t1\n",
    "        \n",
    "        # Stopwatch stopped! How'd we do?\n",
    "        print((\n",
    "            \"Finished iteration {:>\" + str(len(str(num_iters))) + \"}/{}\" +\n",
    "            \" (epoch {:>\" + str(len(str(num_epochs))) + \"}/{})\" + \n",
    "            \" ({:.2f} images/sec)\").\n",
    "            format(iter+1, num_iters, epoch+1, num_epochs, batch_per_device/dt))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# SOLUTION for Part 9 Prep\n",
    "\n",
    "# Reinitializing our configuration variables to accomodate 2 (or more, if you have them) GPUs.\n",
    "gpus = [0, 1]\n",
    "\n",
    "# Batch size of 32 sums up to roughly 5GB of memory per device\n",
    "batch_per_device = 32\n",
    "total_batch_size = batch_per_device * len(gpus)\n",
    "\n",
    "# This model discriminates between two labels: car or boat\n",
    "num_labels = 2\n",
    "\n",
    "# Initial learning rate (scale with total batch size)\n",
    "base_learning_rate = 0.0004 * total_batch_size\n",
    "\n",
    "# only intends to influence the learning rate after 10 epochs\n",
    "stepsize = int(10 * train_data_count / total_batch_size)\n",
    "\n",
    "# Weight decay (L2 regularization)\n",
    "weight_decay = 1e-4\n",
    "\n",
    "# Reset workspace to clear out memory allocated during our first run.\n",
    "workspace.ResetWorkspace()\n",
    "\n",
    "# 1. Use the model helper to create a CNN for us\n",
    "train_model = model_helper.ModelHelper(\n",
    "    # Arbitrary name for referencing the network in your workspace: you could call it tacos or boatzncarz\n",
    "    name=\"train\",\n",
    ")\n",
    "\n",
    "# 2. Create a database reader\n",
    "# This training data reader is shared between all GPUs.\n",
    "# When reading data, the trainer runs ImageInputOp for each GPU to retrieve their own unique batch of training data.\n",
    "# CreateDB is inherited by cnn.ModelHelper from model_helper.py\n",
    "# We are going to name it \"train_reader\" and pass in the db configurations we set earlier\n",
    "reader = train_model.CreateDB(\n",
    "    \"train_reader\",\n",
    "    db=train_data_db,\n",
    "    db_type=train_data_db_type,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# SOLUTION for Part 9\n",
    "# assumes you're using the functions created in Part 4, 5, 6\n",
    "dpm.Parallelize_GPU(\n",
    "    train_model,\n",
    "    input_builder_fun=add_image_input_ops,\n",
    "    forward_pass_builder_fun=create_resnet50_model_ops,\n",
    "    param_update_builder_fun=add_parameter_update_ops,\n",
    "    devices=gpus,\n",
    "    optimize_gradient_memory=True,\n",
    ")\n",
    "\n",
    "workspace.RunNetOnce(train_model.param_init_net)\n",
    "workspace.CreateNet(train_model.net)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# SOLUTION for Part 10\n",
    "test_model = model_helper.ModelHelper(\n",
    "    name=\"test\",\n",
    ")\n",
    "\n",
    "reader = test_model.CreateDB(\n",
    "    \"test_reader\",\n",
    "    db=test_data_db,\n",
    "    db_type=test_data_db_type,\n",
    ")\n",
    "\n",
    "# Validation is parallelized across devices as well\n",
    "dpm.Parallelize_GPU(\n",
    "    test_model,\n",
    "    input_builder_fun=add_image_input_ops,\n",
    "    forward_pass_builder_fun=create_resnet50_model_ops,\n",
    "    param_update_builder_fun=None,\n",
    "    devices=gpus,\n",
    ")\n",
    "\n",
    "workspace.RunNetOnce(test_model.param_init_net)\n",
    "workspace.CreateNet(test_model.net)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# SOLUTION for Part 10 - display reporting setup\n",
    "%matplotlib inline\n",
    "from caffe2.python import visualize\n",
    "from matplotlib import pyplot as plt\n",
    "\n",
    "def display_images_and_confidence():\n",
    "    images = []\n",
    "    confidences = []\n",
    "    n = 16\n",
    "    data = workspace.FetchBlob(\"gpu_0/data\")\n",
    "    label = workspace.FetchBlob(\"gpu_0/label\")\n",
    "    softmax = workspace.FetchBlob(\"gpu_0/softmax\")\n",
    "    for arr in zip(data[0:n], label[0:n], softmax[0:n]):\n",
    "        # CHW to HWC, normalize to [0.0, 1.0], and BGR to RGB\n",
    "        bgr = (arr[0].swapaxes(0, 1).swapaxes(1, 2) + 1.0) / 2.0\n",
    "        rgb = bgr[...,::-1]\n",
    "        images.append(rgb)\n",
    "        confidences.append(arr[2][arr[1]])\n",
    "\n",
    "    # Create grid for images\n",
    "    fig, rows = plt.subplots(nrows=4, ncols=4, figsize=(12, 12))\n",
    "    plt.tight_layout(h_pad=2)\n",
    "\n",
    "    # Display images and the models confidence in their label\n",
    "    items = zip([ax for cols in rows for ax in cols], images, confidences)\n",
    "    for (ax, image, confidence) in items:\n",
    "        ax.imshow(image)\n",
    "        if confidence >= 0.5:\n",
    "            ax.set_title(\"RIGHT ({:.1f}%)\".format(confidence * 100.0), color='green')\n",
    "        else:\n",
    "            ax.set_title(\"WRONG ({:.1f}%)\".format(confidence * 100.0), color='red')\n",
    "\n",
    "    plt.show()\n",
    "\n",
    "    \n",
    "def accuracy(model):\n",
    "    accuracy = []\n",
    "    prefix = model.net.Proto().name\n",
    "    for device in model._devices:\n",
    "        accuracy.append(\n",
    "            np.asscalar(workspace.FetchBlob(\"gpu_{}/{}_accuracy\".format(device, prefix))))\n",
    "    return np.average(accuracy)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# SOLUTION for Part 11\n",
    "\n",
    "# Start looping through epochs where we run the batches of images to cover the entire dataset\n",
    "# Usually you would want to run a lot more epochs to increase your model's accuracy\n",
    "num_epochs = 2\n",
    "for epoch in range(num_epochs):\n",
    "    # Split up the images evenly: total images / batch size\n",
    "    num_iters = int(train_data_count / total_batch_size)\n",
    "    for iter in range(num_iters):\n",
    "        # Stopwatch start!\n",
    "        t1 = time.time()\n",
    "        # Run this iteration!\n",
    "        workspace.RunNet(train_model.net.Proto().name)\n",
    "        t2 = time.time()\n",
    "        dt = t2 - t1\n",
    "        \n",
    "        # Stopwatch stopped! How'd we do?\n",
    "        print((\n",
    "            \"Finished iteration {:>\" + str(len(str(num_iters))) + \"}/{}\" +\n",
    "            \" (epoch {:>\" + str(len(str(num_epochs))) + \"}/{})\" + \n",
    "            \" ({:.2f} images/sec)\").\n",
    "            format(iter+1, num_iters, epoch+1, num_epochs, total_batch_size/dt))\n",
    "        \n",
    "        # Get the average accuracy for the training model\n",
    "        train_accuracy = accuracy(train_model)\n",
    "    \n",
    "    # Run the test model and assess accuracy\n",
    "    test_accuracies = []\n",
    "    for _ in range(test_data_count / total_batch_size):\n",
    "        # Run the test model\n",
    "        workspace.RunNet(test_model.net.Proto().name)\n",
    "        test_accuracies.append(accuracy(test_model))\n",
    "    test_accuracy = np.average(test_accuracies)\n",
    "\n",
    "    print(\n",
    "        \"Train accuracy: {:.3f}, test accuracy: {:.3f}\".\n",
    "        format(train_accuracy, test_accuracy))\n",
    "    \n",
    "    # Output images with confidence scores as the caption\n",
    "    display_images_and_confidence()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### TO DO:\n",
    "(or things to explore on your own to improve this tutorial!)\n",
    "* Create your own database of images\n",
    "* Explore the layers\n",
    "* Print out images of the intermediates/activations to show what's happening under the hood\n",
    "* Make some interactions between epochs (change of params to show impact)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
